Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging

Authors

Abstract

Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in the Korean morphological analysis and part-of-speech (POS) tagging task. The compound POS tagging method is a popular approach, which tackles the problem using annotation tags. However, the dictionary required for post-processing, to recover the base form and resolve the ambiguity of compound tags, degrades system performance. In this study, we propose a novel syllable-based multi-POSMORPH annotation method that solves the problem within one framework, without post-processing. A multi-POSMORPH tag is created by combining POS tags and morpheme syllables for simultaneous tagging and recovery. The model is implemented with a two-layer transformer encoder, which is lighter than existing models based on large language models. Nonetheless, experiments demonstrate that the performance of the proposed model is comparable to, or better than, that of previous...

Similar resources

Syllable-Pattern-Based Unknown-Morpheme Segmentation and Estimation for Hybrid Part-of-Speech Tagging of Korean

Most errors in Korean morphological analysis and part-of-speech (POS) tagging are caused by unknown morphemes. This paper presents a syllable-pattern-based generalized unknown-morpheme-estimation method with POSTAG (POStech TAGger), which is a statistical and rule-based hybrid POS tagging system. This method of guessing unknown morphemes is based on a combination of a morpheme pattern dictionary...

Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging

Statistical methods require a very large, high-quality corpus, but building a large and faultless annotated corpus is a very difficult job. This paper proposes an efficient method to construct a part-of-speech tagged corpus. A rule-based error correction method is proposed to find and correct errors semi-automatically by user-defined rules. We also make use of the user's correction log to reflect feed...

Part-of-Speech-Tagging using morphological information

This paper presents the results of an experiment to decide the question of the authenticity of the supposedly spurious Rhesus, an Attic tragedy sometimes credited to Euripides. The experiment involves the use of statistics to test whether significant deviations in the distribution of word categories between Rhesus and the other works of Euripides can or cannot be found. To count frequencies of...

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

Syllable-based probabilistic morphological analysis model of Korean

In this paper, we present a syllable-based probabilistic morphological analysis model of Korean. While previous morphological analyzers regard the morpheme as a processing unit, the model exploits the syllable as a processing unit in order to endure the unknown word problem. Actually, it does not use any morpheme dictionary. In contrast to the previous systems that depend on manually constructe...

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13052892